XML Data Fusion
Authors
Abstract
Ensuring high-quality data when collecting and integrating information from heterogeneous sources into a data warehouse is a challenging problem. In this paper, we propose a model for XML data fusion that allows the integrator to define data cleaning rules for resolving value conflicts detected during the integration process. These rules resemble the decisions users make when curating data manually. Once they are defined, conflicts detected in subsequent integration processes that fall within the context of existing rules can be resolved automatically, without user intervention. We also introduce a notion of fusion policy validation that prevents conflicting resolution rules from being defined. To validate our proposal, we developed XFusion, a rule-based cleaning tool that stores curated data in an integrated repository.
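The idea of context-bound resolution rules and policy validation can be illustrated with a minimal Python sketch. This is not XFusion's implementation or rule language, which the abstract does not describe. The strategy names, the `validate_policy` and `fuse` functions, the use of a `root/child` path as the rule context, and the sample `book` records are all assumptions made for illustration, and the sketch handles only flat records whose roots share the same tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical resolution strategies; the names are illustrative only.
STRATEGIES = {
    "prefer_first": lambda values: values[0],
    "longest": lambda values: max(values, key=len),
}


def validate_policy(rules):
    """Build a policy, rejecting two different strategies for one context."""
    policy = {}
    for context, strategy in rules:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        if policy.get(context, strategy) != strategy:
            raise ValueError(f"conflicting rules for context: {context}")
        policy[context] = strategy
    return policy


def fuse(xml_a, xml_b, policy):
    """Fuse two flat records; return (fused values, unresolved contexts)."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    # Child tags from both sources, in first-seen order, without duplicates.
    tags = list(dict.fromkeys([c.tag for c in a] + [c.tag for c in b]))
    fused, unresolved = {}, []
    for tag in tags:
        context = f"{a.tag}/{tag}"
        found = [e for e in (a.find(tag), b.find(tag)) if e is not None]
        # Distinct values for this context; more than one means a conflict.
        values = list(dict.fromkeys((e.text or "") for e in found))
        if len(values) == 1:
            fused[context] = values[0]
        elif context in policy:
            fused[context] = STRATEGIES[policy[context]](values)
        else:
            unresolved.append(context)  # left for manual curation
    return fused, unresolved


policy = validate_policy([("book/title", "longest")])
fused, unresolved = fuse(
    "<book><title>XML Fusion</title><price>30</price></book>",
    "<book><title>XML Data Fusion</title><price>35</price></book>",
    policy,
)
```

Here the title conflict falls within the context of an existing rule and is resolved automatically, while the price conflict has no rule and is reported for manual curation. Passing both `("book/title", "longest")` and `("book/title", "prefer_first")` to `validate_policy` raises an error, which is the sketch's stand-in for policy validation.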
Similar resources
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move to a big data world in which unstructured data grows at high velocity, a data store is needed to hold this volume of data. Relational databases, which have traditionally been used, are not well suited to handling data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...
Rewriting XQuery to Avoid Redundant Expressions based on Static Emulation of XML Store
Rewriting composite expressions to eliminate the intermediate results generated by redundant expressions is a traditional optimization technique (known as fusion) in both the programming languages community and the database community. In XQuery, composite expressions for node creation are typical in practice, for example, in data integration systems for XML with XQuery as schema mapping. We propose...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Given the growing number of XML documents, organizing them effectively so that useful information can be retrieved from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Approximate Information Retrieval in a Base of Semi-Structured Documents
ABSTRACT. We propose algorithms dedicated to indexing and approximate information retrieval in heterogeneous semi-structured XML databases. The proposed indexing model is suited to searching textual content in XML contexts defined by tree structures. The approximate search mechanisms implemented rely on a Levenshtein distance ...